Improved Topic Modeling in Twitter Through Community Pooling

Abstract

Social networks play a fundamental role in the propagation of information and news. Characterizing the content of the messages becomes vital for different tasks, like breaking news detection, personalized message recommendation, fake users detection, information flow characterization and others. However, Twitter posts are short and often less coherent than other text documents, which makes it challenging to apply text mining algorithms to these datasets efficiently. Tweet-pooling (aggregating tweets into longer documents) has been shown to improve automatic topic decomposition, but the performance achieved in this task varies depending on the pooling method. In this paper, we propose a new pooling scheme for topic modelling in Twitter, which groups tweets whose authors belong to the same community (a group of users who mainly interact with each other but not with other groups) on a user interaction graph. We present a complete evaluation of this methodology, state of the art schemes and previous pooling models in terms of cluster quality, document retrieval tasks performance and supervised machine learning classification score. Results show that our Community pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets, while also reducing the running time. This is useful when dealing with big amounts of noisy and short user-generated social media texts. Overall, our findings contribute to an improved methodology for identifying the latent topics in a Twitter dataset, without the need to modify the basic machinery of a topic decomposition model.
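The pooling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`detect_communities`, `community_pool`), the `(author, text)` tweet format and the sample data are all hypothetical, and connected components of the interaction graph are used here only as a crude stand-in for a real community detection algorithm, which the abstract does not name.

```python
from collections import defaultdict


def find(parent, user):
    """Union-find lookup with path halving."""
    while parent[user] != user:
        parent[user] = parent[parent[user]]
        user = parent[user]
    return user


def detect_communities(interactions):
    """Map each user to a community id.

    ASSUMPTION: connected components stand in for a proper
    community detection algorithm on the interaction graph.
    """
    parent = {}
    for a, b in interactions:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    return {user: find(parent, user) for user in parent}


def community_pool(tweets, interactions):
    """Concatenate tweets whose authors share a community.

    tweets: list of (author, text) pairs (hypothetical format).
    Returns a dict mapping community id to one pooled document.
    """
    community = detect_communities(interactions)
    pools = defaultdict(list)
    for author, text in tweets:
        # Authors absent from the graph end up in singleton pools.
        pools[community.get(author, author)].append(text)
    return {cid: " ".join(texts) for cid, texts in pools.items()}


# Illustrative data only.
tweets = [
    ("ana", "election debate tonight"),
    ("bob", "who won the debate"),
    ("cat", "new phone review"),
    ("dan", "battery life is great"),
    ("eve", "hello world"),
]
interactions = [("ana", "bob"), ("cat", "dan")]
pooled = community_pool(tweets, interactions)
```

Each pooled document would then be passed unchanged to a topic model, which matches the abstract's point that the decomposition model itself needs no modification.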

Similar resources

Twitter Topic Modeling by Tweet Aggregation

Conventional topic modeling schemes, such as Latent Dirichlet Allocation, are known to perform inadequately when applied to tweets, due to the sparsity of short documents. To alleviate these disadvantages, we apply several pooling techniques, aggregating similar tweets into individual documents, and specifically study the aggregation of tweets sharing authors or hashtags. The results show that ...
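As a rough illustration of the two aggregation strategies this abstract mentions, the sketch below pools tweets by author and by hashtag. The function names, the `(author, text)` input format and the choices to lowercase hashtags and drop untagged tweets from hashtag pools are assumptions made for the example, not details taken from the paper.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def pool_by_author(tweets):
    """Join all tweets of the same author into one document."""
    pools = defaultdict(list)
    for author, text in tweets:
        pools[author].append(text)
    return {author: " ".join(texts) for author, texts in pools.items()}


def pool_by_hashtag(tweets):
    """Join all tweets sharing a hashtag into one document.

    ASSUMPTIONS: hashtags are compared case-insensitively, a tweet
    with several hashtags joins every matching pool, and tweets
    without hashtags are left out.
    """
    pools = defaultdict(list)
    for _, text in tweets:
        for tag in sorted({t.lower() for t in HASHTAG.findall(text)}):
            pools[tag].append(text)
    return {tag: " ".join(texts) for tag, texts in pools.items()}


# Illustrative data only.
sample = [
    ("ana", "#AI is fun"),
    ("ana", "more on #ai and #ML"),
    ("bob", "no tags here"),
]
by_author = pool_by_author(sample)
by_hashtag = pool_by_hashtag(sample)
```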

Characterizing Twitter Discussions About HPV Vaccines Using Topic Modeling and Community Detection

BACKGROUND: In public health surveillance, measuring how information enters and spreads through online communities may help us understand geographical variation in decision making associated with poor health outcomes. OBJECTIVE: Our aim was to evaluate the use of community structure and topic modeling methods as a process for characterizing the clustering of opinions about human papillomavirus ...

Topic Modeling in Twitter: Aggregating Tweets by Conversations

We propose a new pooling technique for topic modeling in Twitter, which groups together tweets occurring in the same user-to-user conversation. Under this scheme, tweets and their replies are aggregated into a single document and the users who posted them are considered co-authors. To compare this new scheme against existing ones, we train topic models using Latent Dirichlet Allocation (LDA) an...

Online Topic Modeling for Real-Time Twitter Search

This paper discusses the work done by a team at the University of Florida for the TREC 2011 Microblog Track. To build a real-time microblog search engine we rely on topic modeling for our search. To facilitate our algorithms we bundle similar tweets together in what we call supertweet generation. We perform online inference and offline inference depending on the time frame of the topical query....

Assignment 2: Twitter Topic Modeling with Latent Dirichlet Allocation Background

In this assignment we are going to implement a parallel MapReduce version of a popular topic modeling algorithm called Latent Dirichlet Allocation (LDA). Because it allows for exploring vast document collections, we are going to use this algorithm to see if we can automatically identify topics from a series of Tweets. For the purpose of this assignment, we are going to treat every tweet as a docu...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-86692-1_17